A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning
Adaptive gradient methods have become popular in optimizing deep neural
networks; recent examples include AdaGrad and Adam. Although Adam usually
converges faster, variants of Adam, such as the AdaBelief algorithm, have
been proposed to address Adam's weaker generalization relative to the
classical stochastic gradient method. This paper develops a generic
framework for adaptive gradient methods that solve non-convex optimization
problems. We first model the adaptive gradient methods in a state-space
framework, which allows us to present simpler convergence proofs of adaptive
optimizers such as AdaGrad, Adam, and AdaBelief. We then utilize the transfer
function paradigm from classical control theory to propose a new variant of
Adam, coined AdamSSM. We add an appropriate pole-zero pair to the transfer
function from the squared gradients to the second-moment estimate. We prove the
convergence of the proposed AdamSSM algorithm. Applications on benchmark
machine learning tasks, image classification using CNN architectures and
language modeling using an LSTM architecture, demonstrate that the AdamSSM
algorithm achieves a better trade-off between generalization accuracy and
convergence speed than recent adaptive gradient methods.
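The pole-zero idea above can be sketched as a second-order filter on the squared gradients. In the sketch below, the pole and zero locations are hypothetical illustrative choices (the paper's actual placement may differ), and the gain c is fixed so the filter's DC gain is one, matching the steady state of Adam's exponential moving average:

```python
def adam_ssm_second_moment(grads, beta=0.999, pole=0.9, zero=0.5):
    """Second-moment estimate whose transfer function from g_t^2 to v_t is
        H(z) = c (1 - zero*z^-1) / ((1 - beta*z^-1)(1 - pole*z^-1)),
    i.e. Adam's exponential moving average augmented with one extra
    pole-zero pair. The pole/zero values here are illustrative only."""
    # Fix the gain so H(1) = 1: a constant gradient g yields v -> g^2.
    c = (1 - beta) * (1 - pole) / (1 - zero)
    v_prev = v_prev2 = u_prev = 0.0
    vs = []
    for g in grads:
        u = g * g
        # Second-order difference equation realizing H(z).
        v = (beta + pole) * v_prev - beta * pole * v_prev2 + c * (u - zero * u_prev)
        vs.append(v)
        v_prev2, v_prev, u_prev = v_prev, v, u
    return vs

# With a constant gradient of 2, the estimate settles at 2^2 = 4.
v = adam_ssm_second_moment([2.0] * 20000)
print(abs(v[-1] - 4.0) < 1e-2)
```

In a full optimizer, v would replace Adam's per-coordinate second-moment accumulator, with the usual bias correction applied on top.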
A Kalman Filter Approach for Biomolecular Systems with Noise Covariance Updating
An important part of system modeling is determining parameter values,
particularly for biomolecular systems, where direct measurements of individual
parameters are typically hard. While Extended Kalman Filters have been used for
this purpose, the choice of the process noise covariance is generally unclear.
In this chapter, we address this issue for biomolecular systems using a
combination of Monte Carlo simulations and experimental data, exploiting the
dependence of the process noise covariance on the states and parameters, as
given in the Langevin framework. We adapt a Hybrid Extended Kalman Filtering
technique by updating the process noise covariance at each time step based on
the current state and parameter estimates. We compare the performance of this
framework with different fixed
values of process noise covariance in biomolecular system models, including an
oscillator model, as well as in experimentally measured data for a negative
transcriptional feedback circuit. We find that the Extended Kalman Filter with
such a process noise covariance update is closer to the optimality condition,
in the sense that the innovation sequence becomes white, and achieves a better
balance between the mean-square estimation error and the parameter convergence
time. The
results of this chapter may help in the use of Extended Kalman Filters for
systems where the process noise covariance depends on states and/or parameters.
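As a rough illustration (a toy model, not the chapter's exact filter or circuit), consider a hybrid EKF for a scalar birth-death gene-expression model dx/dt = k - g*x, where the Langevin framework gives a process-noise variance of (k + g*x)*dt; the parameter k is appended to the state, and Q is recomputed at every step from the current estimates instead of being fixed a priori. All numerical values are assumptions:

```python
import numpy as np

def ekf_step(xhat, P, y, dt=0.1, g=0.5, R=0.25):
    """One hybrid-EKF step for the augmented state (x, k)."""
    x, k = xhat
    # Prediction: Euler step of dx/dt = k - g*x; k is modeled as constant.
    x_pred = np.array([x + (k - g * x) * dt, k])
    F = np.array([[1 - g * dt, dt], [0.0, 1.0]])  # Jacobian of the prediction
    # State/parameter-dependent process noise from the Langevin framework,
    # evaluated at the current estimates (small floor keeps Q valid).
    Q = np.diag([max(k + g * x, 1e-6) * dt, 1e-6])
    P_pred = F @ P @ F.T + Q
    # Update with a direct (noisy) measurement of x.
    H = np.array([[1.0, 0.0]])
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T / S
    innov = y - x_pred[0]
    xhat_new = x_pred + (K * innov).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return xhat_new, P_new, innov

# Toy run: true k = 2, g = 0.5, so the steady state is x = 4.
rng = np.random.default_rng(0)
xhat, P = np.array([0.0, 1.0]), np.eye(2)
for _ in range(2000):
    y = 4.0 + 0.5 * rng.standard_normal()
    xhat, P, _ = ekf_step(xhat, P, y)
print(abs(xhat[1] - 2.0) < 0.5)  # k estimate should approach 2
```

With a fixed Q, the filter must trade off tracking speed against noise rejection blindly; updating Q from the Langevin expression ties that trade-off to the estimated system itself.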
Control Theory-Inspired Acceleration of the Gradient-Descent Method: Centralized and Distributed
Mathematical optimization problems are prevalent across various disciplines in science and engineering. Particularly in electrical engineering, convex and non-convex optimization problems are well-known in signal processing, estimation, control, and machine learning research. In many of these contemporary applications, the data points are dispersed over several sources. Restrictions such as industrial competition, administrative regulations, and user privacy have motivated significant research on distributed optimization algorithms for solving such data-driven modeling problems. The traditional gradient-descent method can solve optimization problems with differentiable cost functions. However, the speed of convergence of the gradient-descent method and its accelerated variants is highly influenced by the conditioning of the optimization problem being solved. Specifically, when the cost is ill-conditioned, these methods (i) require many iterations to converge and (ii) are highly unstable against process noise. In this dissertation, we propose novel optimization algorithms, inspired by control-theoretic tools, that can significantly attenuate the influence of the problem's conditioning.
First, we consider solving the linear regression problem in a distributed server-agent network. We propose the Iteratively Pre-conditioned Gradient-Descent (IPG) algorithm to mitigate the deleterious impact of the data points' conditioning on the convergence rate. We show that the IPG algorithm has an improved rate of convergence in comparison to both the classical and the accelerated gradient-descent methods. We further study the robustness of IPG against system noise and extend the idea of iterative pre-conditioning to stochastic settings, where the server updates the estimate based on a randomly selected data point at every iteration. In the same distributed environment, we present theoretical results on the local convergence of IPG for solving convex optimization problems.
Next, we consider solving a system of linear equations in peer-to-peer multi-agent networks and propose a decentralized pre-conditioning technique. The proposed algorithm converges linearly, with an improved convergence rate compared to decentralized gradient-descent. Considering the practical scenario where the computations performed by the agents are corrupted, or a communication delay exists between them, we study the robustness guarantees of the proposed algorithm and a variant of it. We apply the proposed algorithm to decentralized state estimation problems.
Further, we develop a generic framework for adaptive gradient methods that solve non-convex optimization problems. Here, we model the adaptive gradient methods in a state-space framework, which allows us to exploit control-theoretic methodology in analyzing Adam and its prominent variants. We then utilize the classical transfer function paradigm to propose new variants of a few existing adaptive gradient methods. Applications on benchmark machine learning tasks demonstrate our proposed algorithms' efficiency. Our findings suggest further exploration of existing tools from control theory in complex machine learning problems.
The dissertation concludes by showing that the potential of the aforementioned IPG idea goes beyond solving generic optimization problems, through the development of a novel distributed beamforming algorithm and a novel observer for nonlinear dynamical systems, with IPG's robustness serving as a foundation of both designs. The proposed IPG for distributed beamforming (IPG-DB) facilitates rapid establishment of communication links with far-field targets while jamming potential adversaries, without assuming any feedback from the receivers, subject to unknown multipath fading in realistic environments. The proposed IPG observer utilizes a non-symmetric pre-conditioner, like IPG, as an approximation of the inverse Jacobian of the observability mapping, such that it asymptotically replicates the Newton observer with the additional advantage of enhanced robustness against measurement noise. Empirical results demonstrate the efficiency of both methods compared to existing methodologies.
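For concreteness, the core iteration of IPG can be sketched in a centralized setting for least squares (the dissertation distributes these updates over a server-agent network; the step sizes and regularization value below are illustrative choices, not the tuned values from the thesis):

```python
import numpy as np

# Least-squares problem min_x ||Ax - b||^2 with a known solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
b = A @ np.ones(5)                            # true solution: all-ones vector

H = A.T @ A
beta = 0.1                                    # regularizes the preconditioner
alpha = 1.0 / (np.linalg.norm(H, 2) + beta)   # preconditioner step size
delta = 1.0                                   # estimate step size

K = np.zeros((5, 5))                          # preconditioner iterate
x = np.zeros(5)                               # estimate iterate
for _ in range(300):
    # Preconditioner update: drives K toward (A^T A + beta I)^{-1}.
    K = K - alpha * ((H + beta * np.eye(5)) @ K - np.eye(5))
    # Preconditioned gradient step on the least-squares cost.
    x = x - delta * K @ (A.T @ (A @ x - b))
print(np.allclose(x, np.ones(5), atol=1e-3))
```

Because K converges to (A^T A + beta I)^{-1}, the estimate update approaches a regularized Newton step, which is what attenuates the influence of the problem's conditioning.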
Iteratively Preconditioned Gradient-Descent Approach for Moving Horizon Estimation Problems
Moving horizon estimation (MHE) is a widely studied state estimation approach
in several practical applications. In the MHE problem, the state estimates are
obtained via the solution of an approximated nonlinear optimization problem.
However, this optimization step is known to be computationally complex. Given
this limitation, this paper investigates the idea of iteratively preconditioned
gradient-descent (IPG) to solve the MHE problem, aiming for improved
performance over existing solution techniques. To our knowledge, the
preconditioning technique is used for the first time in this paper to reduce
the computational cost and accelerate the crucial optimization step for MHE.
The convergence guarantee of the proposed iterative approach for a class of MHE
problems is presented. Additionally, sufficient conditions for the MHE problem
to be convex are also derived. Finally, the proposed method is implemented on a
unicycle localization example. The simulation results demonstrate that the
proposed approach can achieve better accuracy with reduced computational costs.
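To make the optimization step concrete, the following toy MHE instance (a hypothetical scalar system, not the paper's unicycle example) reduces the horizon cost to a quadratic in the initial state and solves it with plain gradient descent, the step that IPG is proposed to accelerate:

```python
import numpy as np

# Toy MHE setup: scalar dynamics x_{k+1} = a*x_k, measurements y_k = x_k + v_k.
# Over a horizon of N samples, the MHE cost reduces to a least-squares problem
# in the initial state x0:  J(x0) = sum_k (y_k - a^k * x0)^2.
rng = np.random.default_rng(2)
a, N, x0_true = 0.9, 10, 5.0
phi = a ** np.arange(N)                 # maps x0 to the predicted outputs
y = phi * x0_true + 0.1 * rng.standard_normal(N)

# Plain gradient descent on J — the optimization step MHE repeats at every
# time instant, and the computational bottleneck the paper targets.
x0 = 0.0
step = 0.5 / (phi @ phi)                # = 1/L; the Hessian of J is 2*phi@phi
for _ in range(200):
    grad = -2.0 * phi @ (y - phi * x0)
    x0 = x0 - step * grad
print(abs(x0 - x0_true) < 0.3)
```

In the nonlinear case (e.g. unicycle dynamics), J is no longer quadratic and can be ill-conditioned, which is where the iterative preconditioning of the gradient step pays off.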
Visualization of Multiple Genome Annotations and Alignments With the K-BROWSER
We introduce a novel genome browser application, the K-BROWSER, that allows intuitive visualization of biological information across an arbitrary number of multiply aligned genomes. In particular, the K-BROWSER simultaneously displays an arbitrary number of genomes both through overlaid annotations and predictions that describe their respective characteristics, and through the multiple alignment that describes their global relationship to one another. The browsing environment has been designed to allow users seamless access to information available in every genome and, furthermore, to allow easy navigation within and between genomes. As of the date of publication, the K-BROWSER has been set up on the human, mouse, and rat genomes.